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1. Introduction 

A statement of statistical belief not uncommon in cosmic ray work is: "y° u need five 
sigmas to convince me". This has some justification, in that the history of cosmic 
rays contains many instances when a source or effect is claimed but not subsequently 
substantiated. Frequently this has been due to incorrect application of some statistical 
technique, often a failure to account fully for the 'degrees of freedom'. 

Most of the present body of statistical knowledge has been developed for specific 
problems, few of which occur in cosmic rays, although one of the most useful of texts[l|] 
was produced for experimental particle physicists. The analyser of cosmic ray data has 
particular problems: cosmic ray data requires great effort in collection and they are 
unlikely, once analysed, to be repeated. The numbers are frequently small and there are 
usually data missing and frequently there is significant contamination by noise. Ideally, 
a statistical measure should be developed specifically for each application. This is the 
only way in which all of the parameters of the experiment can be allowed for in the 
analysis. It is more usual for a general statistical tool to be applied, for example x 2 > 
which may not be optimal for the purpose, and for some experimental variables to be 
ignored. 

The focus of this review will be on methods of determining the presence of a signal 
rather than estimating some parameters of the data. The aim is to gather together 
the recent developments in methods of analysis of the temporal and spatial features of 
cosmic ray data, especially where the methods used are not 'traditional'. 

Several new methods have been published recently which depend on Bayesian ideas, 
and these ideas have been introduced before the description of the methods. 

2. On/Off Counts 

2.1. Introduction 

The subject of detecting the presence of a source in counting rate data, using off-source 
control data has appeared many times]]], KL 01 0]. Despite these numerous airings, 
erroneous statistical significances are occasionally still being published. In principle the 
question is easy to pose: if Non counts are detected when an instrument is pointed at 
a source and there is also a background counting rate, and N<jff counts are detected 
when it is collecting background counts only under otherwise identical conditions, what 
is the likelihood that there is a genuine source? 

A common treatment is to give for the significance of the excess counts: 



where a is the ratio of time on-source to time off-source, toN = atoFF- This is based 
on the supposition that the best estimate of the observed 'signal' is Non — cvNoff, 
the variances of the ON and OFF counts for a Poisson distribution are Nqn and 
Nqff and the variance of the difference between Nqn and ccNqff is the weighted 



N SIGMA 



Nqn — CiNoFF 



(1) 
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sum of the variances. The statistic used is Student's t which in the limit of large 
numbers is Gaussian. Since the distributions of N ON and Nqff are Poissonian, this 
expression should be used only if the numbers of events is sufficiently large for a Gaussian 
approximation to Poissonian to be valid. 

It is an example of only one type of statistic which could be used in ON/ OFF 
situations - a goodness- of- fit statistic to determine whether the observed data could 
have arisen from an a priori distribution. Other statistics could have been used, for 
example x 2 > which in this instance would have one degree of freedom. Asymptotically 
they should have the same result, that is they both should reject or accept the null 
hypothesis equally. In these tests the null hypothesis is that the observations were 
both samples from the same population and that any difference arose merely by chance. 
There is no explicit alternative hypothesis, but an implicit one: that if the difference 
between the counts was unlikely to be due to chance, it arose because of a genuine 
source, strength unspecified. 



2.2. Likelihood Analysis 

An optimal test exists for the intermediate case where there are two completely specified 
hypotheses: H : the null hypothesis as described above, and H\\ a hypothesis involving 
another model, usually including a specific 'signal'. In this rare (in cosmic rays) case, the 
Neyman/Pearson theorem shows that the likelihood ratio is optimal for any distribution 
function for the errors. 

In the more usual case, Hi is not fully specified, but has one or more free parameters. 
The null hypothesis H Q is that Noff and Non are both samples of the same population 
for which the source strength S = 0. The alternative hypothesis Hi is that Nqn contains 
an unknown source component, S > 0. In this case there is no optimal test, except that 
for errors of the exponential family, such as a Gaussian, the likelihood ratio is expected 
to be near-optimal. 

The problem was discussed at length twenty five years ago by 0'Mongain|| and 
Hearn0 but was not solved satisfactorily, at least in this field, until the maximum 
likelihood treatment of Gibson et al. Q and Dowthwaite et al.@ and later by a similar 
treatment by Li and Ma||. In these treatments the observed ON and OFF counts 
are due to (i) an unknown background B plus an unknown source S and (ii) the same 
unknown background B alone. The likelihood ratio is maximised with respect to the 
possible source counts: 

A= / P(N ON ,N OFF \S = 0) 

\P(N ON , Noff \ S = N ON ~ aN OFF ) 
a ( N ON + Nqff \ 
1 + oc V Nqn J 
A standard result is that the probability of obtaining a given A is obtained from 

-21n(A)~ X 2 (l) (3) 



1 + a 
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+ Noff \ 
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2.3. Comparison of methods 

Both equations [I] and ^ are valid asymptotically: only for large values of Noff and 
Non- Equation [l] assumes that the error distributions of Non and Noff are gaussian 
and equation § assumes that — 21og(A) is distributed as X2iX)- To check the region 
of validity, random data sets have been generated for each of a number of values of 
N ff- For each data set a has been set to 1.0 and a value of S3 = 3\J1B + S3 has been 
calculated, which is the 3a value of S = S3 assuming the validity of equation [l|. At 
each value of Noff, 10 6 data sets were generated using a poissonian random number 
algorithm 0. The fraction of samples N of Nqff where N > Nqff + S3 is used as 
an estimate of the true probability of obtaining N Q n = Nqff + S3 by chance. The 
results are shown in figure |] where both equations [I] and |] are shown to overestimate 
the probability near the 3a level almost equally likely, the former slightly less so. It is 
evident that, near the 3a level, there is little to choose and both equations are adequate 
for values of Noff and Non of a few hundred or more. Since good algorithms are 
available for Poissonian random number generation it is likely to be better to determine 
the probability of N Q n and Nqff for values less than ~ 100 using Monte Carlo methods 
tailored for the exact values of Nqn and Noff- 



3. Time Series 

3.1. Introduction 

Time series analysis has been the subject of very many books and articles and has been 
applied in very many fields. The term covers a wide range of concepts, including Change 
Point Analysis, Fourier Analysis and Trend Analysis. In cosmic ray studies, there are 
several areas of application, such as sidereal/solar effects on low energy cosmic rays on 
the ground, periodicity in data from point sources, either from satellite X- and 7-ray 



6- 
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logio(N F F ) 



Figure 1. Ratio of probabilities from equation [T] (full circles) and equation |] (full 
triangles) to Monte Carlo results for various values of Nqff and Nqn = Nqff + S3 
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data, or from ground-based Cerenkov detectors, and sporadic emission of a wide range 
of cosmic ray energies. In these cases, the raw data is usually in the form of time-tagged 
events. 



3.2. Bursts of Events 

This section will be concerned with the problem of deciding whether the counting rate 
of a detector has deviated from the expected rate due to a real outburst of events. 
The problem is usually most difficult in data comprising time-tagged events. An initial 
analysis could start with binning the data and looking for a deviation from the expected 
Poissonian distribution of the counts. One problem with this approach is that in the 
model of a single Poisson process generating the counts, each bin is independent, the 
experimenter often has the freedom to place the bins, both in position and width, 
arbitrarily. This alters the 'degrees of freedom' and experience suggests that more 
bursts have been 'detected' in the past than could have been justified from the data. 

The problem mentioned above is a specific one but in general most statistical 
problems associated with sporadic emission relate to the lack of a specific model for 
the form, duration and amplitude of emission, and the feeling is often that, given a 
free hand with the parameters, any pure noise series could made to disclose a 'burst'. 
A recent paper by Scargle|| suggests that existing methods for searching for rapid 
variability in A— ray and 7-ray astronomy do not fully extract all of the information 
contained in photon counts. The reasons given included 'binning fallacies', in that the 
data were widely binned and the size of the bins must be large enough to give 'good 
statistics'. Further, global methods such as autocorrelations and power spectra used 
on large data sets dilute the effects of sporadic bursts. The Bayesian response to these 
problems is discussed later. 

The problem at first sight does not seem insolvable using classical statistical theory. 
The statistical treatment of point processes: data occurring as points on the real line, or 
as discrete times, is covered by several texts, for example Cox and Isham||. The general 
treatment covers a variety of statistical processes, including Poisson (which is of most 
application here), doubly stochastic Poisson (where the average Poisson rate is itself a 
variable) and renewal processes where the distribution function for intervals between 
points is not exponential. In analysing data in the form of time-tagged photons without 
appreciable dead time, classical statistics would look for a powerful goodness-of-fit test 
of the pure Poisson process, if possible avoiding the loss of information and the arbitrary 
choices associated with binning. 

Given such a series of times, the problem posed here is: is there evidence for 
'bunching' or 'bursts' ? Alternatively, are the data consistent with a uniform distribution 
in time which, for events not affected by counter dead-time, would be governed by a 



pure Poisson process? Some recent papers such as McLaughlin et al.pO[ use just this 
assumption to classify sources into 'steady' or 'variable'. Others use ad hoc methods to 
estimate the probability of bursts [p"T| . 
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3.2.1. The Scan Statistic The test statistic postulated above exists: the Scan Statistic 
has been extensively studied by Parzen|T2|, Barton and David]]!!, Huntington and 



19 



Naus[n|, Neff and Naus||15||, Naus[|16[1, Glaz[l7|, Wallenstein, Naus and Glaz[[ 
Chen and Glaz[[2(J and Mansson pl| . It is a statistic for detecting clustering in time or 
one dimension in space. It is usually described as the maximum (or minimum) number 
of events which can be found in a window of fixed duration scanning smoothly through 
a much longer interval containing discrete events following some random process, for 
example Poissonian. An example of the scan statistic is shown in figure 1 for window 
lengths of 1% and 10% of the duration of the data. The random test data has a constant 
mean rate except for the third quarter which has double the rate. 
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Figure 2. Example of the Scan Statistic for windows of 1% and 10% of the data 
duration 

The scan statistic S has a probability P(S) which depends on the rate of events, the 
duration and the width of the scanning window. Some exact solutions for the probability 
P(S) have been provided. One of them, by Huntington and Naus|L4|, provides the 
probability of a related statistic: 

P (a n < a) = 1 - Yl R det I det IVy I ( 4 ) 

Q 

where a n is the smallest interval a containing n events in the range [0, 1], where this 
range contains N events in all. The summation extends over the set Q of all partitions 
of N into 2L + 1 integers satisfying Ui + n i+ i < n, % = 1, . . . , 2L and R = N\b M \a — b) N ~ M 
with M = ^fc=o ra 2fc+i, and 

2i-l 

hij = nk — (i — j)n L + l>i>j>l 

k=2j-l 
2j-2 

= - Yl nk + (i - l<i <j <L+1 

k=2i 
2i 

■ J2 Uk 

k=2j 



I; 



(i — j)n L > i > j > 1 
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n 


exact 


Newell-Ikeda 


5 


0.711 


0.984 


6 


0.225 


0.565 


7 


0.0425 


0.129 


8 


6.31 x 10~ 3 


0.0196 


9 


8.04 x 10~ 4 


2.48 x 10~ 3 


10 


9.05 x 10" 5 


2.76 x 10" 4 



Table 1. Comparison of Scan Statistic probabilities 



2j-l 

£ 

k=2i+l 



nk + (j 



i)n 



l<i<j<L 



Equation |], although exact, is computationally expensive for large N and small a, that 
is a large data set with a small scanning window, but several approximations have been 
provided which are designed to be valid for certain combinations of parameters. 



3.2.2. Newell-Ikeda Approximation for the Scan Statistic The Newell- Ikeda||2~2|, [2~3f 



asymptotic formula is suitable for small probabilities. It gives the probability of finding 
a section of length t in a data set of length T, given a Poisson process of average rate A: 

P (n; AT, t/T) = 1 - exp (-X n t n ~ x T /(n - 1)!) (5) 

As shown in table [I], it significantly overestimates larger probabilities. Better 
approximations, although not as easy to calculate, are available, for example Conover, 



Bement and Iman ¥2M and Nausfl25 . 



3.2.3. Naus Approximation for the Scan Statistic The more exact treatment of 



Naus|^5| will be given without derivation. For an average rate of events A, data of total 
duration T and scanning window of duration t, define L = T/t. Then the probability 
that the number of events in a scanning window never exceeds n is Q* (n; XL, 1/L) and 
is accurately approximated by: 

L-2 



Q* (n;XL, l/L) = Q*[n;2X,- 



Q* 



n;3A,ij/Q* (n; 2A, l - 



(6) 



Note that this approximation is valid for a wide range of types of distribution for the 
time between events. Exact formulae for Q* (n; 2A, |J and Q* (n; 3A, |) are given for a 
Poisson process: 

2 
1' 



Q* (n;2A, 
Q* ( n\ 3A 



5 g / r n-3 



F n _ x - (n - 1) PnPn-2 — (n — 1 — A) PnK-3 
A x + A 2 + A 3 - A 4 
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where 



A 1 = 2p n F n _ 1 {{n - 1) F„_ 2 - AF n _ 3 ) 

A 2 = 0.5p 2 n ((n - 1) (n - 2) F n _ 3 - 2 (n - 2) AF n _ 4 + A 2 F n _ 5 ) 



n-1 



-43 = y^P2n-ri 71 r -l 



r=l 



A 4 = y^p 2 n-rPr ((>" ~ 1) ^r-2 ~ AF r _ 3 ) 



r=2 

and Pi,F n are the Poisson probability and distribution functions: pi = e~ xt (\t) 1 /i\ 
and F n = YZ^Pi- 

Tight bounds for Q*(n) have been given by Glaz and NauspTJ and a recursive 
method proposed ]27j for calculating Q*(n;2t) and Q*(n;3t) for situations where the 
random quantity X\ may take on values other than 0, 1, that is situations where an 
'event' cannot be given as either present or absent but only with a non-zero probability. 

Other approximations for the tail of the scan statistics and the moments of its 



distribution have been given by Glaz [|I7J] and Chen and Glaz [20]. Sample tables of the 
scan statistic have been given for n < 500 by Glazp8|, |T7|j . 

This treatment of the scan statistic is for an interval of length t, specified in advance. 
When searching for a 'burst' of events, an a priori length cannot always be specified. 
An extension to the treatment above has been described by Nargawallap9| in which the 
length need not be pre-assigned. 



3.2.4- Aim Approximation for the Scan Statistic A new approximation has been given 
recently by Alm[Q which is accurate and easy to calculate for large values of T/t and 
At. This treatment examines the distribution of upcrossings, that is occurrences where 
the number of events in the scanning window increases by 1 as the window is moved. By 
separating these events into primary and secondary upcrossings, the dependence of the 
second type from the first (almost) independent events allows significant simplifications. 
If each window of length t were independent, the expected number of events would be 
At with a Poisson probability function F\ t (n) and distribution function p\t{n). The 
approximation based on the ideas above gives the simple modification: 

At 



P(N > n) = 1 - F xt (n)exp 



n + 1 



\(T-t)p Xt (n) 



(7) 



Equation [7] has been tested for At = 40 and T/t = 3600 using 10000 Monte Carlo 
simulations. The results are shown in table ||] for 13 < n < 23. It can be seen that 
equation ^ is a good approximation within the sampling errors. 



3.2.5. 



Other Approximations for the Scan Statistic Other methods have been 



published, for example the Burst Expectation Search by Giles|31] and CUSUM by 



VanStekelenborg and Petrakis[32|. The first follows earlier work[33] which used binned 



The Analysis of Cosmic Ray Data 



9 



n 


Monte Carlo 


Equation [7| 


13 


0.9984 


0.9993 


15 


0.9429 


0.9478 


16 


0.6654 


0.6635 


17 


0.3094 


0.3087 


18 


0.1055 


0.1098 


19 


0.0307 


0.0337 


20 


0.0093 


0.0095 


21 


0.0021 


0.0025 


22 


0.0006 


0.0006 


23 


0.0002 


0.0001 



Table 2. Comparison of 1-D Scan Statistic probabilities for 10000 samples with 
T/t = 3600 and Xt = 40 



times of events and calculates Poisson probabilities of bin counts from a running average 
of a sample of bins. The BES inverts this process and, for each possible bin count from 
zero to several hundreds, calculates the mean rate below which the possible count could 
be a significant burst using a fixed sample of bins around the trial bin. The aim of 
keeping a fixed sample was to avoid problems arising from a step function edge entering 
a moving average. 



3.2.6. Bursts: Summary In summary, of possible methods suggested for searching for 
bursts using classical statistics, the Scan Statistic is recommended, both for time-tagged 
data and for time-binned data. For small data sets or large window sizes, equation f| 
provides an exact probability of the largest number in any window arising due to chance. 
Many approximate formulae are available, depending on whether the probability of the 
scan statistic is expected to be large or small. In most practical cases in cosmic rays 
the statistic is used to search for a possible outburst and so the probability of a given 
value of the scan statistic arising due to chance will be small in order to be useful. It 
becomes a matter of computational convenience which of the formulae above is used but 
equation [j] delivers a good approximation over a wide range of probability values and is 
easy to calculate. It also has the advantage, in terms of understanding the principles, of 
starting from the naive initial Poissonian formula with non-overlapping (independent) 
windows. Its use is therefore recommended here. 



3.3. Periodicity 

Most of the methods for time-series analysis, including trend analysis and auto-regressive 
moving average (ARMA), have been developed for fields other than cosmic rays, for 
example @, |35|, [36| . Fourier methods suitable for data at equally spaced times are well 
developed but are usually not suitable, although these have been extended to discuss 
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unequal intervals and missing data||37|| . A bibliography of astronomical time series 



analysis has been given by KoenpS]. 



3.3.1. The Rayleigh test and Dependants The spur for the introduction of the Rayleigh 
test into 7-ray astronomy was the unsatisfactory nature of the statistics being used 
before. Early tests on 7-ray data used epoch-folding to produce a histogram in phase, 
and x 2 as a statistic for goodness of fit to a uniform distribution. This suffers from 
several disadvantages: 

(i) the freedom to select the number of bins, 

(ii) the freedom to define the starting phase, 

(iii) the failure to use the information contained in the order of the bins. 

This last problem can be overcome to some degree by using the Run Test, which is 
independent and therefore whose probability may be combined with that from x 2 - The 
result of the freedoms listed above is that different authors could return quite different 
chance probabilities, given the same data, despite using the same test statistic. The 



analysis of 7-ray data from the Crab pulsar by Gibson et al.|39[ contained the first 
known use of the Rayleigh statistic @, |4]J in cosmic ray work. It is still a goodness-of- 
fit statistic, which has no explicit hypothesis as an alternative to the null hypothesis. 

The time of each event is treated as a unit vector in the plane, with an angle equal 
to the pulsar phase. If N unit vectors of random orientations (random phases) are 
added, the distribution of the resultant R may be obtained from the distribution of the 
orthogonal components of the vectors, sin0j , cos0j where <pi is the phase of the i 
vector. The means of these components are : 

l N l N 

i=l i=l 

From the Central Limit Theorem (CLT) means of samples of C are distributed, for 
large N, as a Gaussian with varc = a c /N. For vectors uniformly covering the circle: 

i-2-k 

a%= I cos 2 0)#/27r = 0.5 



therefore varc = var$ = 1/2N. The quantities C and S are asymptotically uncorrelated 
and have zero means. The statistic 2NR 2 = 2NC 2 + 2NS 2 is therefore the sum of the 
squares of two zero-mean, unit-variance uncorrelated variables and is distributed as \ 2 



with 2 degrees of freedom |42], [43|]. The probability distribution function (pdf) of R is : 
f{R)dR = 2NRe~ NR2 dR (8) 

and its cumulative probability distribution is : 

F(R) = e~ NR2 (9) 

The quantity NR 2 is known as the Rayleigh power. 

If a data set spans a time interval T the number of independent frequency trials 
in the frequency range fi to fi is v = T (/1 — / 2 ) if T >> /i,/2, with the independent 
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frequencies separated by 1/T. In practice allowance must be made for leakage: the 
possible effect of a signal at frequency f on trial frequencies / with | / — / |> 1/T, 
and oversampling: the possibility of obtaining a larger value of NR 2 by varying the 
frequency between adjacent independent frequencies. This has been done using by de 
Jager et al.||44|| by Monte Carlo techniques and analytically by Orford|45|]. Both methods 



agree that the number of trials is nT (fi — / 2 ) where n is a slowly varying function of 
F(R) in the range 2 to 4, with a value approximately 3 for F(R) ~ 10 -3 . 

3.3.2. The Z\ Test The Z\ test is the extension of the Rayleigh test to include 
harmonics. IF n separate harmonics are included with independent coefficients, the 
statistic is 

■n 

Z 2 n = 2NY,R 2 (^) (10) 

i=l 

where 2NR 2 (iu) is the Rayleigh power for the i th harmonic. Z\ is distributed as x 2 {2n). 
Variations on this technique depend on the method used to select the number and 
weighting of harmonics. A similar principle is used in radio astronomy where a pulse 
of width W is searched for using P/2W harmonics which improves the signal to noise 
by a factor of up to (P/2W / )°' 5 |4^|. A search for 7-ray emission from radio pulsars 
proposed the use of Z\ as a relatively powerful but general test for periodicity^]. The 



power of the Rayleigh test for light curves from sinusoids to 5-functions was explored 
by Protheroe EBJ. A variant of Z\ is the H-test[44[ in which the value of n is obtained 



objectively from the data and Z\ is suitably rescaled. This last test is most suitable for 
multi-mode light curves. 

3.3.3. Limitations of the Rayleigh and associated statistics The foregoing results for 
the Rayleigh (Z 2 ) and Z 2 >1 tests are for the asymptotic case, that is: uncorrelated C 
and S with zero means. In most practical applications, these conditions are not strictly 
met. Ground-based gamma-ray observations of long-period pulsars are limited by: 

(i) being only a few hours in duration and 

(ii) variations in zenith angle, producing changing counting rates. 

The requirement for large sample size is usually met - typical counting rates are ~ 1 
per second over several hours. The result is an enhancement of x 2 i n pure noise data 
for longer test periods - red noise. The first limitation listed above may be overcome 
by truncating the dataset so that only integral multiples of the trial period are tested 



- see Carraminana et al.|5(J and Raubenheimer and Ogelman|]5T]]. As a result, the two 
trigonometric terms have zero expectations, given a constant mean counting rate. This 
truncation is easy to accomplish, but results in a variable data selection depending 
on the test period and therefore all periods are not accorded the same treatment. 
Since the periodogram is the convolution of the power spectral density with the Fourier 
transform of the data window, any spectral estimate based on a truncated data set is 
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biased |52|, |42| , |53f . Further, any correlation introduced by the second limitation above 
will not be removed this way. An attempt to remove the results of the counting rate 
variation has been made by Raubenheimer et al.[54] by fitting a parameter a in an ad 
hoc modification of the Rayleigh probability distribution: 

F(R) = e ~ 2aNR2 (11) 

to random data sets containing no signal, but with the same parameters as the test data 
set. For data taken on Vela X-l (period « 5 minutes) they found that equation |Tl~l with 
a = 0.4 (as opposed to 0.5 from simple theory) gave a probability distribution which was 
a good fit to the distribution in \ 2 f° r noise at periods near to 5 minutes in simulated 
data sets. 



3.3.4- Modified Rayleigh Statistic If the expectations of C and S, their variances and 
their covariance are not assumed to be zero, ^ and zero respectively, but are calculated 
for a specific dataset, then the asymptotic probability equation |9| may be valid, given a 
sufficiently large number of events [Q. 

The expression for x 2 m the case of samples of two correlated variables C and S is 



X 



c- 


E{C) ' 


T 




cov s ,c 


-1 


" c 


-E{C) ' 


s- 


E{S) _ 




cov s ,c 


4 




s 


-E{S) 



(12) 



For any data set, the substitution of the actual values of E(C), E(S), ac, cr 5 and 
covs,c w iH result in a value of x 2 corrected for the correlation of the variables C and S 
and with a probability distribution, for large sample size, given by exp(— x 2 /2). 

In the case of a box-car data set with a constant average counting rate, a starting 
time ti and ending time t-i with T = t2 — 1± and a trial period P = 2tt/uj : 



E{C) 
E(S) 
Nvar c 
Nvars 
Ncovs. 



sin 



[- cos ut]£ 



ujT 



t-2 



[sino;i cos uot] t 2 



2ujT 



[sin ut cos ijjt] * 



c 



2uT 



2uT 

*2 



- \E{C)] 2 

- \E(S)} 2 



E{C)E{S) 



These depend solely on u>, t% and t 2 and their substitution in equation |T2| gives a \ 2 value 
corrected for the finite length of the data set. If it is known that there is no secular change 
in counting rate the substitution of the above equations into equation [T2| would give the 



correct formal probability of chance occurrence, even if the duration of the data set is 
less than the trial period, as long as the number of events was high enough for the CLT 
to be valid. It is more usually the case in ground-based gamma-ray observations that the 
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box-car function is only an approximation. Monte Carlo simulations of data sets have 
been carried out to test the validity of data set truncation and the above formulations 
for the case of secular variations of counting rate superimposed on noise. In order to 
test the validity of the probability distribution equation ^5] down to probabilities ~ 1CT 6 , 
data sets were generated using a multiplicative congruential algorithm with shuffling, 
chosen to avoid serial correlations. The repeat period is longer than 2 x 10 18 . The time 
of each event tj was generated from the previous event: tj = £j_i — A(t) ln(rnd) where 
A(t) is the mean separation of events as a function of time. A group of 10 6 data sets of 
duration 8000 s were simulated with a counting rate profile R = Ro(l — 0.3t/4000) and 
Rq = Is" 1 . Each data set was tested for periodicity at a trial period of 295s by finding 
C and S with reference to the time of the first event. These values were substituted 
into equation [T2| for various assumptions about the form of E(C), E(S), varc, var$ and 
covs,c- The probability of chance occurrence was calculated from e"* 2 / 2 . 

The resulting cumulative frequency distributions for e _x2//2 have been calculated 
for the cases of (a) a truncation of the data set to integral multiple of the trial period, 
(b) box-car function and (c) a linear fit to the counting rate profile. The ratios of the 
observed to expected frequencies of occurrence of x 2 chance probabilities is shown in 
figure |3|, as functions of log (x 1 probability). Note that the duration of the data set is 
corrected for equally well by (b) and (c). 

3.5 -n 

- - - Truncated Data Set - - 

— — Boxcar Correction - y 
3- / 

Linear Fit Correction 

2.5- ^ ■■ ' 

Ratio 

observed/ , 
expected ^ ' 

/ 

15- ^ ' 

0.5 1 1 1 1 1 

-1 -2 -3 -4 -5 

log l0 ( >£ probability) 

Figure 3. Ratio of cumulative frequency of \ 2 probabilities to expectation 

The boxcar and truncated statistics both make corrections for the finite length of 
a data set, but give a residue which may be identified as being caused by the change 
of counting rate during the trial period. Longer trial periods or greater rates of change 
in counting rate would amplify their biases. The truncation method has a distribution 
which may be fitted, for this simulated data set, by a form such as equation |ll| with 
a = 0.45. The linear fit model is seen to be a good representation of the noise spectrum 
down to chance probabilities of ~ 10 -5 . 
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3.3.5. Other Tests Leahy et al.[|56| pointed out that the unmodified Rayleigh test 
was powerful for detecting wide peaks in a light curve, in fact it is identical with a 
likelihood ratio test of a sinewave plus uniform against a uniform phase distribution 
f5?fl . In addition, for a light curve of a von Mises form (the circular generalisation of 
the Gaussian), the Rayleigh statistic exhausts the data's information on periodicity if 
the concentration parameter k is allowed to vary freely . 

Narrow periodic pulse detection, with significant power in the higher harmonics, is 
bound to be quite difficult because the number of degrees of freedom increases with the 
trial frequency range. Protheroe[|!| proposed a test statistic 

„ n— 1 n 

T n = ~ rr £ £ (Aij + 1/n)- 1 (13) 

n(n — 1 

v ' i=l j=i+l 

which looked for close clustering of points on the circle. In this statistic Ay is the 
distance between the angles Xi and Xj of two events on the circle: 

A^ = 0.5- | [| {xi - X j) | -0.5] | 

The null distribution was found using Monte Carlo methods for n < 200 and critical 
values given. The context of the test was the search for ultra-high energy 7-rays from 
Cygnus X-3 and was therefore not designed for large n. In this limitation it is similar 
to the exact expression scan statistic described above. Others have suggested variants 



which are designed to be powerful for certain classes of pulsed emission |TT| . 

The Scan Statistic may also be used for searching for non-uniformity in phase. For 
narrow windows its probability distribution is well approximated by the scan statistic 
on the line[|60]. No systematic work on the use of the Scan Statistic in periodic analysis 



has been traced. 

3. 3. 6. Searching for a Periodicity It is usually only the case that a unique periodic 
ephemeris is available for high energy photons from an isolated radio pulsar. In other 
cases, a search must be made in period, and the test used must allow for the freedoms 
associated with the trial period range. A rule-of-thumb arising from the number of 
'degrees of freedom' implicit in a periodicity search using the Rayleigh test |44], [45| is 
that the search should be at intervals in period of (IF I)/ 3 (IF I = Independent Fourier 
Interval). For a period of P in a data set of duration T this corresponds to a trial 
period step of ~ P 2 /3T. This step size has the advantage that the number of degrees of 
freedom to be used to interpret the peak periodic amplitude found is approximately the 
number of periods tried. If harmonics of the test period are to be included, the spacing 
would be correspondingly reduced and hence the number of trial periods increased. For 
the Z\ test, the reduction in period step, and consequent increase in both computation 
and the degrees of freedom to be accounted for, is by a factor n. 

When searching for pulsed emission from some sources, in particular binary sources, 
there is frequently poor knowledge of the both the pulsar period and period derivative. 
In this case the light curve will be narrow only if the correct period P and period 
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derivative P is offered to the test. A nearby, but not correct, trial period and the 
ignorance of a period derivative will smear the light curve. If a true light curve were 
a 5 function at period P = P a and the trial period was P = P a + A, the light curve 
would be a rectangular distribution in phase of length TA/P^. This effectively limits 
the number of harmonics which may be realistically added to P^/TA — 1. 

Some searches for periodicity are combined with a search for a DC excess. This 
is common in Cerenkov telescope searches where ON-source data is compared with 
OFF-source control data to detect any DC component. The combined analysis of this 



situation was proposed by Lewis |63j in which a statistic a is defined as the sum of the 
Rayleigh statistic and the square of equation [[], distributed as x 2 (3)- The assumption 
in this case is that all of the excess is pulsed; if there is an unpulsed component the test 
statistic will be biased. Again, the presence of a possible unpulsed component could be 
built into a Bayesian analysis. 

3. 3. 7. Conclusion on Periodicity The question of the best classical test for the presence 
of periodicity is a complex one. The selection of the most sensitive test requires a 
knowledge of 

• the pulsar ephemeris 

• the light curve shape 

• the background noise distribution 

If all of these are known in advance, a most powerful test, based on the Z\ extension of 
the Rayleigh test is likely to be close to optimum. Frequently some or all of these will 
be unknown or poorly known. In this case, some allowance must be made for the lack 
of knowledge and the test selected should not contain any assumption which causes a 
significant bias. It has sometimes been claimed that the Rayleigh test is 'biased' towards 
broad light curves and that a test which is more sensitive to narrow light curves should 
be used when such a light curve is suspected. This raises the problem, discussed in the 
previous section, of the smearing of a light curve if the pulsar's ephemeris is uncertain. 



Protheroe has suggested [62] that if one has no information about the nature of the 
phase distribution one should be conservative and adopt the Rayleigh test. A rational for 
this is that if one is searching for an unknown period and an unknown light curve, which 
is quite common in 7-ray work, and there is no significant power in the fundamental, 
then a test involving the addition of an unknown number of higher harmonics is unlikely 
to be successful. This point will be revisited later in discussing a Bayesian method of 
searching for periodicity. 

A simple suggestion made before and reiterated here is that if Z\ does not show 
evidence for periodicity, that is: there is no significant power in the fundamental or the 
first harmonic, either in addition or separately, then it is unlikely that the data will 
contain a strong periodic signal. 
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4. Spatial Analysis 



4-1. Introduction 

Spatial analysis of arrival direction data is of great interest for X- and 7-rays from 
satellites and for cosmic rays of the highest energies, which may not be greatly deflected 
in the galactic magnetic field. 

Simple methods rely on a grid placed on the events and counts in the grid cells taken 
as independent Poisson-distributed events. If the cells are fixed absolutely, there is no 
problem in ascribing a suitable Poissonian probability to the largest number detected in 
any cell. If there is freedom to incrementally move the cell containing the largest count, a 
larger number is generally found. In this case the new cells created are correlated and the 
assumption of independence is incorrect: simple application of Poissonian probabilities 
is inappropriate. The problem of having the freedom to move the boundaries of the 



cells was pointed out for cosmic ray 'sources' by Hillas|65[ who suggested a conservative 
number of 'sigmas'. Large scale anisotropy in gamma ray bursts were sought using 
dipole and quadrapole analysis^]. A 'pair matching' statistic was used by Bennett 
and Rhie|67] to check for gamma ray burst repeaters rather than 'nearest neighbour' 
methods used by others|B^] and criticised by Nowak[B"5|. Many methods have been 
used which are based upon a known point-spread function (PSF). Amongst these are 
Maximum Entropy methods such as those used for satellite X-ray imaging|)4|, maximum 
likelihood [[7(J and Hough Transfer ms [71 . 



In the next section it is suggested that the scan statistic is a powerful and general 
statistic for which good approximations exist for the chance probabilities. It has 
recently been extended to two dimensions by Loader JT2|, Chen and Glaz|73[] and Alm|30 



Kulldorf[74 has extended this further to higher dimensional searches. 



4-2. 2- Dimensional Scan Statistic 

This is the two-dimensional development of the Scan Statistic introduced above. It 
will be introduced using a notion of 'elemental' cells from which a two-dimensional 
scanning window is constructed. In effect, the scanning window may be moved by 
discrete steps of the size of the elemental cell. Assume that a two-dimensional square 
region R = [0,L] x [0,L] of side L is inspected for occurrences of 'sources' [ff3| l. The 
region is partitioned into n x n elemental cells so that the size of a cell h = L/n. The 
contents of each of the n 2 cells are independent. 

For 1 < i < n and 1 < j < n, define a random variable Yij as the number of events 
in the elemental cell [(i — l)h,ih] x [(j — l)h,jh]. A square box ofmxm small cells is 
scanned over the whole of region R. There will be v such boxes, partially dependent if 
m > 2, with v = 0, (n — m + l) 2 . 

Define 

i'2+m—l ii+m— 1 

S(ii,ia)= E E F m 

j=i 2 i=i 1 
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Figure 4. Two-dimensional Scan. The region [0,L]X[0,L] [ s partitioned into n x n 
cells (n = 16) and an m x m window (m = 4) is scanned over it. 



to be the number of events in the square box of m 2 adjacent cells starting at i — 
j = i 2 . If, during the scanning of the m— box, S{ix,i 2 ) exceeds a particular value k, a 
'source' has been detected. For 1 < ii,i 2 < n — m + 1 define an 'event' A ilyi2 as an 
occurrence of S(ii,i 2 ) > k and as a member of the set A of all such occurrences. 
The two-dimensional scan statistic is defined as: 

S m = max {S(ii, i 2 ); 1 < i\ < n — m + 1, 1 < i 2 < n — m + 1} 

and the probability that S m has at least a value k is: 

f n— m+1 n—m+1 



P(S m >k) = P[ |J |J A ilM 



11=1 «2 = 1 

(n— m+1 n—m+1 
n n <, 
11=1 «2 = 1 

where v4^ iji2 is the occurrence of S(ii,i 2 ) < k. 

4-2.1. Glaz Approximation to 2-D Scan Statistic For a fixed value ofl < %\ < n — m + 1 
the one-dimensional approximation holds: 

'n-m+l \ / x n-2m 



12=1 

and since n — m + 1 square regions ofmxm are scanned, a reasonable approximation 
is||§: 

(\ (n— 2m+l)(n- m+1) 
(14) 
Q2m-1 J 

For a Poissonian distribution of events the following expression was found to be a good 
approximation |73| : 

P(S m >fc)«l- exp(-A*) (15) 
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where the approximate mean for the asymptotic Poisson distribution is 

A* = 1 - q 2m -2 + (n - 2m + 2)(n - m + \){q 2m -2 ~ qW-i) 

and 

g m+w = P(Al tl n i . . . n A^) 



Tables are given |fT5| of this and other approximations, for the Poisson model of m < 20 
and n < 500. 



4-2.2. Aim Approximation to 2-D Scan Statistic A recent paper [3(| has given an 
approximation based on a modification of the method of counting upcrossings used in 
equation which is easy to calculate and moreover is given for a more generally useful 
rectangular scanning window [0, a] x [0,6] in a rectangular region [0,S] x [0,T]. The 
scan statistic L is the maximum content of a scanning window with a two-dimensional 
Poissonian process X with event density A: 

L = L(X, a, b, S, T) = max X([t — a, t] x [s — b, s] 

a<t<T,b<s<S 

The probability of observing at least n events in a scanning window is: 

P(L > n) w 1 - F N(a) e-^ (16) 



where 



7n+1 w ( 1 - (T - a) 6A (> n - e"^ +1 

/'" ~ ( 1 - — J - b) p\ ab (n - 1) 



and 



F N (a) « F Aa6 exp 



Aa6 



Aa (5-6) PAa<»(w) 



n+ 1, 

p M and F M are the Poisson probability and cumulative probability distributions. 

The predictions of equation 16 have been compared with the results of 10 5 Monte 
Carlo simulations in table |]for S = T = 20, a = b = 2 and N = 800. The agreement is 
good, allowing for the errors inherent in the Monte Carlo results. For interest, the 
final column shows the Poissonian probability obtained if the cells were treated as 
independent. In this particular case, the result of assuming independence of the cells 
would be a fairly consistent overestimate of the significance of the 'source' by about 
'3cr'. The precise amount of underestimate of the chance probability will depend on the 
number of elemental cells in the scanning window. Finally, the treatment of |30| has 
been extended to other shapes of scanning window, such as circular. 

4-2.3. Summary In summary, the 2-D Scan Statistic is a preferred general statistic 
for those cases where events are located randomly on a plane, within fixed bounds, and 
where there is no a priori expectation such as a known source with known instrumental 
spread function. In most practical situations a good approximation is obtained by using 
equation [16. 
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n 


Monte Carlo 


Equation 16 


Poisson 


7 


0.9990 


0.9975 


3 x 10 -3 


8 


0.8658 


0.8795 


10 -3 


9 


0.4271 


0.4501 


2 x 10 -4 


10 


0.1351 


0.1346 


4 x 10~ 5 


11 


0.0326 


0.0304 


7 x 10~ 6 


12 


0.0089 


0.0059 


10~ 6 


13 


0.0021 


0.0011 


2 x 10~ 7 


14 


0.0005 


0.0002 


3 x 10" 8 



Table 3. Comparison of 2-D Scan Statistic probabilities for 100000 samples with 
N=800, S=T=40, a=b=2. Also shown are the simple Poissonian probabilities assuming 
independent cells 



5. Bayesian Methods 

5.1. Introduction 

For many workers in cosmic rays, Bayesian methods are relatively novel and the following 
section attempts to summarise the main ideas and methods. A much fuller development 



of the ideas discussed below is given by Loredo[75, 58, 76 



5.1.1. Statistics The term 'statistics' arises from the concept of a statistic. A statistic 
is a number derived from observed data and which obeys certain rules, some of 
which depend on a hypothesis about the system under observation, some of which are 
extraneous. From this number, one can say how likely it is that the data was drawn 
from a population obeying rules specified by the particular hypothesis, assuming that all 
extraneous quantities are allowed for. From that, by an inversion of logic, it is inferred 
how likely is the hypothesis. In many cases, a particular statistic is used because the 
experimental results appear to be presented, or may be rearranged to be presented, in 
a form which allows an easy calculation of that statistic. An example is the epoch- 
folding of time-tagged photon times above, followed by \ 2 calculated from the binned 
phases. As was pointed out in that example, some arbitrary choices had to be made 
which rendered the results unsatisfactory. Also, the aspects of the experimental data 
which were used to calculate a statistic may not be all that are available, or the most 
discriminating aspects. This should, with careful design, be evident from a consideration 
of the statistic's 'power' but not necessarily. It is the claim of Bayesians that such 
problems are inherent in 'classical' statistics and derive from a misunderstanding of the 
meaning of Probability. 



5.1.2. The Meaning of Probability There were at the beginnings of the subject, and 
still are, two schools of thought. The first school maintains that the term probability is 
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a statement of the frequency of occurrence of data, such as that taken in a very large 
number of repeats of an experiment, under the assumption that random factors are at 
work causing the possibility that the results could be different every time. Take, as an 
example, coin-tossing: the probability of heads is obviously 0.5 in a single toss. Everyone 
would agree that, assuming no trickery, an unbiased coin would land equally likely as 
'heads' or 'tails'. But there are forces at work which affect the way a coin would land - 
all amenable to analysis. In fact a coin-tossing machine could be made which obtained 
'heads' or 'tails' every time. We regard coin-tossing as a random activity only because 
we expect humans to apply unconscious variability to the force and direction of the flip 
which is much greater than that needed in the initial conditions to obtain one more 
extra turns before landing. This illustrates an important point: unless the hypothesis 
is clear and specifies all pertinent factors there could be an apparent randomness. That 
is not to say that true randomness does not exist, only that it is often used as an alibi 
for lack of knowledge or precision in stating the experimental conditions. 

An alternative definition of probability is 'a measure of belief in a certain hypothesis'. 
This is, and was, a much easier idea to grasp but one which was felt from an early 
date not to be capable of exact or scientific analysis. One consequence of this idea is 
that for a unique set of data, perhaps taken on a naturally-occurring phenomenon, the 
idea of a very large set of repeated experiments to plot out the 'frequency distribution 
necessary to use a 'statistic' was unrealistic. It is this definition which underlies Bayesian 
thought, and indeed is the definition which more closely accords with the questions 
for which measurements are made. Interestingly, this meaning of probability explains 
the frequency version as a special case using de Finetti's representation theorem for 
exchangeable sequences of events [|77[ . 

The main difference between the two philosophical approaches is how the data are 
related to the hypotheses. 

(i) The 'Frequentist' approach: We obtain P(D \ H), the conditional probability 
of obtaining the observed data, given a particular hypothesis. The hypothesis 
is frequently a model M which has a parameter space 9 and P(D | M, 8) is the 
'sampling distribution' for the data, given the model. A frequently met hypothesis is 
the 'null' hypothesis H in which the parameters are set to zero. For any hypothesis, 
a statistic is formed which is 'locally most powerful' or even better 'uniformly most 
powerful' and the probability of observing the data is assigned from a knowledge 
of the statistic's distribution function. This is now used to give a range of values 
in which the value of a statistic may fall by chance, with given probability (i.e. 
frequency). As an interesting aside, it is most usually the case for continuous 
measures, and frequently for discrete measures, for a range of values of the statistic, 
including that observed (but also including many values not observed), to be used 
to derive the probability. This is frequently performed by integrating P(D | M, 6) 
over the sample (data) space. That is to say, the probability of a hypothesis is 
determined by the data taken, plus a whole range of values of data which were 
not observed. This curious situation is not often questioned by ordinary users of 
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'statistics'. 

(ii) The 'Bayesian' approach: We obtain P(H | D), the probability of a hypothesis, 
given the data - apparently a more difficult matter. However Bayes Theorem gives: 

P(H | D)P(D) = P(D | H)P(H)\eadmg to 
P(H | D) = P(D | H)P(H)/P(D) 

P(D | H) is the likelihood function, P{D) is the global likelihood, usually 
treated as an ignorable normalising constant, P{H) is the 'prior' probability of 
the hypothesis. In addition to the extra terms not used in frequentist analysis, a 
crucial difference between the approaches is that Bayesian methods would integrate 
the likelihood function over the parameters space, rather than the sampling (data) 
space. Bayesian methods have been criticised for the inclusion of an apparently 
subjective quantity P{H) but a trivial example demonstrates that frequentist 
analyses are not free from this. Frequentists would determine if the hypothesis H of 
a histogram having all the cells identical were true by taking x 2 as a statistic. They 
would use no prior information or knowledge. But we know that histogram cells 
cannot contain negative numbers, and so some relevant background information is 
ignored when using \ 2 - 

5.1.3. An Example An example of the different approaches is an experiment in which 
a coin is tossed N times. It lands heads H times. The question is: is it biased? 
In the Frequentist approach a hypothesis (the 'null' hypothesis) is formed that the 
coin is unbiased and that the result is a function of randomness only. A sufficiently 
low probability which is obtained for a suitable statistic would be evidence that the 
'null hypothesis' should be abandoned. The binomial distribution describes the result 
of such discrete, bounded experiments. The probability of H heads in N tosses is 
0.5^ x 0.5^"^ x C 1 ^. One then calculates the probability of obtaining 0,1,2, ... H — 1 
heads and add them to the probability of H heads and say: 'if the null hypothesis 
is true, getting H or fewer heads in N tosses can occur due to chance, with a given 
probability. This could be interpreted as some evidence against the 'null hypothesis', 
hence evidence that the coin is biased. 

5.1.4- Stopping Rules Setting aside for the moment the fact that we did not see 
0,1,2, ... H — 1 heads, the last conclusion supposes that the coin was tossed, irrespective 
of the result, N times and that the number of heads H was the random variable. But 
suppose the coin was actually tossed by a person until H heads were obtained, and 
that happened to occur after N tosses. In this case the number of tosses N is the 
random variable and the number of heads H is fixed. The probability is then derived 
by combining the individual probabilities of obtaining H heads from h > H tosses, and 
may be significantly different from the first probability calculated. 

Loredo[]58| gives a more apposite example: a theorist predicts that / = 10% of the 
stars in a cluster should be of type A. An observer reports 5 stars of type A out of 96 
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observed. The theorist calculates as follows: N = 96 and / = 0.1 gives 9.6 predicted 
type A stars. The probability of the value of x 2 — 2.45 of this information is P = 0.073, 
which is acceptable at the 95% level. The observer however decided in advance to stop 
when he had found 5 stars of type A. The expected value of N is then 5/0.1 = 50 with 
variance 5(1 — /)/ f 2 = 450. The probability of the value of x 2 — 4.7 of this information 
is P = 0.032, which is not acceptable at the 95% level. This ambiguity arises because 
of the stopping rule used by the experimenter that is - what data sets might have been 
observed. 

The stopping rule can therefore be important in classical statistical analysis, and 
ignorance of the actual rule used may lead to an erroneous or at least ambiguous 
conclusion. Knowledge of the exact stopping rule is less important in Bayesian analysis, 
but is valuable in particular when it contains useful information about the unknown 
quantities. In other cases, the stopping rule could be important if the existence of some 
data is unknown to the analyser, perhaps because its analysis did not show it to be 
significant and it was suppressed by the experimenter. The message from this example 
is that for Frequentist analysis to be possible, an experiment must be precisely defined 
and if the execution is different in any way from the plan, the data could be worthless. 

5.1.5. Conclusion In summary, frequentist methods establish P(D | 9M) as the 
sampling distribution of the data, given a model M with parameters 9 and perform 
integration over the data space. Bayesian methods start with the same function 
P(D | 0M) but treat it is a likelihood with integrations performed over the parameter 
space of the model. In particular, parameters which are necessary for the specification 
of the model but are not of interest (for example the phase when looking for a periodic 
signal) are integrated out, or marginalised. 

In Bayesian theory, the notion of a 'random variable' is absent so ambiguity does 
not arise for many types of stopping rule and there is no need for a 'reference set' of 
hypothetical data. This state of affairs results from the need in Bayesian methods to 
be specific about all the hypotheses, or to integrate away any unspecifiable variable. 
Taking again the example of a histogram, and the question of whether its cells are 
consistent with uniformity using % 2 If the null hypothesis is the only hypothesis available, 
the use of x 2 is as a 'goodness-of-fit' test for the supposition of flatness. The number 
observed in the i th bin of n bins is X{. The number expected in each bin, under the 
null hypothesis, is avg = YH=i x il n an d X 2 — Y^=i( x i ~ av 9) 2 1 av 9i assuming avg is 
large enough (usually 10 or so) for asymptotic normality. The probability of x 2 f° r 
n — 1 degrees of freedom is interpreted as supporting or otherwise the null hypothesis. 
This statistic suffers from a major problem in that it ignores information - the order of 
the bins may be significant, and so it implicitly assumes a class of alternative models 
in which the order is unimportant. This can be partially rectified by applying an 
independent test which is only determined by the order of the bins - the Run Test. 
This is only applying a patch, since the Run test is most powerful against monotonicity 
and not other patterns. Frequentists acknowledge this problem in general by using the 
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idea of the power of a statistic, that is its ability to identify correctly a true model 
from a particular alternative. Both approaches have subjective factors: Bayesian in 
assigning prior probabilities to hypotheses, Frequentist in the notion of randomness and 
its applicability in a mathematical sense to cover for a lack of knowledge of the exact 
experimental conditions. A consequence of this is that different experts in both fields 
may come to different conclusions given the same data. Another way of putting this is 
that the result of analysing data will be a conclusion within a range, depending on (a) 
the Bayesian priors, or (b) the estimate of the degrees of freedom and unknown factors. 



5.2. Bayesian On/ Off Analysis 

The Bayesian ideas in the above section have recently been applied to the ON/OFF 
problem treated earlier. As in all Bayesian analyses, some judgement must be made of 
the priors to be used, but in the cases discussed here the results do not depend critically 
on how these priors are chosen. 

An initial Bayesian analysis of the problem of detecting a source in an ON/OFF 
counting experiment has been given by Loredo[[T5|]. Using the same notation as in the 
ON/OFF section above, the probability of the background rate b (the posterior density) 
from the OFF-source data is: 

p (b | Noff) = P (Noff I b) P ® 

P {Noff) 

The Poisson likelihood for Nqff is: 

p(N ff I b) = ±-L- ■ 

IVOFF- 

The parameter b is unknown and so the 'prior' probability would appear to be a matter 
of guesswork. If the range of b were pre-specified in some non-arbitrary way, at least 
the scale of b would be known, and a flat prior would be reasonable. If even the scale of 
b is unknown, the 'least informative' prior for b is p(b) = 1/6, which is uniform in log b, 
and then 

p(Noff) = Tt r / b N °^- l e- bT db 

l\OFF- Jo 

This leads to 

, T FF(bToFFf OPF - 1 e tT <>" 
P(6|AW) = fc^? 

Note that the expectation of the background b = Noff/T and that the assumption of 
p(b) = 1/6 does not strongly affect the result, Loredo pointing out that a prior uniform 
in b only marginally alters the expectation b = (Noff + ty/T. The joint probability of 
the background rate b and a source rate s, given Non and Noff, is: 

p (N ON I sb) 



p(sb | N n) =p(s\ b)p(b) 



V (Non) 
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The probability of the source rate s is obtained by marginalising b, that is p (s \ Non) 
Jp(sb\ Nqn) db: 



pis 



N ° N T ( qT Y- 1 v-sTon 

no,) = e ^'t'). — < 17 > 



where 



£Y_ _ - .1 / (7V OJV -i)! 



V^OiV fi _j_ 1\J (N ON +N OFF -j-l)\ 
Luj=\ V-T a ) (NoN-j)'- 

This result is formally correct for all positive values of Non and Noff and is particularly 
useful for small values when the asymptotic treatments fail. 

Its main value is to illustrate the completely different approach and result of the 
application of Bayesian ideas. However, there are some computational problems for 
values of Non and Noff which exceed ~ 100. For values of Non and Noff which are 
less than ~ 100 evaluation of equation ^| and equation [T7| shows small differences in the 
derived probabilities. 



5.3. Bayesian Change Point Analysis - Bursts 

A recent paper by Scargle|| has used Bayesian methods to analyse structure in photon 
counting data. It is worth noting that the ON/OFF problem dealt with above is a 
special case of change point analysis, where there is only one change point and its 
location is known in advance. The principles are the same as those outlined above, 
with the added simplicity of having simpler alternatives to the uniform model. The 
uniform counting rate model Mi assumes a constant intensity over a particular time 
interval T. An alternative model M2 has the interval T broken into two regions T\ and 
T 2 , T = T\ + T 2 , each with a different counting rate. In general, a model My. may be 
constructed with k regions. Bayes Theorem give the probability of a model 



p (M fc \D,I) = 



p(D 


M k ,I)p(M h 


I) 


p(D 


I) 



Dropping the explicit appearance of the background information /, the odds ratio Okj 
between two competing models and Mj is then 

p(M k I D) _ p(D I M k )p(M k ) 
p(Mj I D) p(D I Mj)p(Mj) 

The parameter 9 or vector of parameters 9 of the model M k enter when p(D | M k ) is 
calculated 

p(D I M k ) = J p(D I 9,M k )p(9\ M k )d9 

The odds ratio O k j is then 

p(M k I D) 
kj p(Mj I D) 
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_ p(M fc ) / P (D | 9,M k )p(9\ M k )d9 



p{Mj) J p (D | 9, Mj)p{9 | Mj)d9 
p(M k ) C(M k ,D) 



18) 



p(M 3 )C(M^D) 

where C(M k , D) is the global likelihood of model M k . 

For the constant-rate model M 1; N events arrive in a time T which is treated 
as being divided into M intervals of duration St, the justification being that photon 
counting apparatus always has a resolving time. Note that the number of events in any 
particular interval St can be or 1 only. The author shows that the global likelihood 
for this constant-rate model of Time Tagged Events (TTE) is 

£(Ml 1 TTE) ~ f(M + 2) 

If the data is time-binned into M equal bins, but such that any number of events may 
occur in any bin, given an overall rate A = N/T and mean number per bin of /i = XT/M, 
the global likelihood is: 

T(N + 1) 
C{M X | Binned) = ^ 

Note that the bins are fixed and may not be scanned to maximise L. 

The alternative model M k has a likelihood which is the product of the likelihoods 
of the individual constant-rate regions of T. For a two-rate model with the time of the 
change of rate being t cp 

C{M 2 \D)= Jdt.pjdA, JdA 2 p^(t^)xp[D 1 \M 1 (A 1 ,T 1 )}p(A 1 ) 
xp[D 2 \M 2 {A 2 ,T 2 )}p(A 2 ) 

where A = XSt, P(A) is the prior for the rate A and P cp is the prior for the change- 
point time t^. For time-tagged data with resolution St the integrals are sums and the 
change-point location is m cp St. Since the change-point can be tested only at the arrival 
time of a photon, the photon number of the change-point n cp is used as an index. The 
number of events in the first section, up to the change-point, is N\ = n cp , N 2 = N — N\ 
and Mi = m„ tp The global likelihood is then 

T(n cp + l)r(m n - n cp + 1) 



C(M 2 | D)=J2 



T{n cp + 2) 



T(N- n cp +l)T(m N _ ncp - (N- n cp ) + 1) A ^ 
r(AT-n cp +2) 

The paper p| gives a coding in a popular mathematical package to implement the 
above ideas. 



5. 4- Bayesian Periodicity Analysis 
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5.4-1- Introduction Frequentist statistical theory allows more than one test to be 
applied to any situation. Any statistic, or function of the data, may be defined and 
the 'best' is selected depending on its 'power' or likelihood of selecting the 'correct' 
hypothesis. One of the problems of the frequentist approach to looking for evidence of 
periodicity is that, in the absence of a specific light curve, the alternative hypothesis (to 
one of uniformity in the phase distribution) is unknown and the power of a statistical 
test is difficult to specify except for a narrow class of alternative light curves. The 
Rayleigh statistic, Zf, is powerful only for the fundamental period and is formally the 
most powerful test for alternatives to uniformity from the Von Mises distribution - the 
circular equivalent of the Gaussian on the line. The Z\ test allows the addition of 
n — 1 harmonics but needs a protocol to decide when to stop adding harmonics and 
therefore degrees of freedom (the H test mentioned above suggests such a protocol). 
Finally, Protheroe's test is powerful for very narrow light curves. Each could be tried in 
succession to look for evidence of periodicity, but a method which is indifferent to the 
shape of the light curve, without any penalty, would be of great advantage. 

5-4-2. Gregory & Loredo Method Such a method based on Bayesian analysis, is claimed 
by Gregory and Loredo |J75[ . The essence of the method is to compare a uniform model 
for the distribution in phase at a trial frequency with a periodic model. The great 
difference between this and other methods is how the periodic model is proposed and 
how the necessary uncertainties and their associated 'degrees of freedom' of classical 
theory are accounted for. In particular, since an arbitrary postulated light curve may 
be of any shape, the method automatically applies Ockham's razor, in that models 
with fewer variables are automatically favoured unless the evidence from the data more 
than compensates. More complicated light curves (not necessarily with small number of 
harmonics, a 5-function is uncomplicated in this context) are penalized for their greater 
complexity. 

Bayes Theorem is used to compare the probabilities of two parameterised models 
of the phase distribution. In the notations of the authors, the probability that a model 
M describes the data, given the data D and any background information I is 



The first term on the right, p(M | I), is the prior probability of the model M, which 
may seem to be subjective but may be estimated in some cases from the permissible 
range of the parameters. The numerator in the second term, p (D | M, I), is the sampling 
probability of the data D, or the likelihood of the model M. The denominator, p (D \ I), 
is the global likelihood of the entire class of models. If the model contains a parameter 
6, or in the case two or more parameters a vector 6, the likelihood of the model can be 
calculated: 



p(M \D,I)=p (M | I) 



p(D\M, I) 
P(D\I) 



(19) 
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For time-tagged photon data with N events detected over a time T, the probability 
of D for a particular rate model r(t) can be calculated. For the time T divided into 
very small intervals of length At, the probability of n events in At is: 

_ [r(t)At] n e~ r ^ At 

Pn I 

n\ 

If At is small enough for pi = 0,i > 2 then the sequence of T j At time samples will 
contain N containing one event and Q = T/At — N containing no event. The likelihood 
is then: 

N Q 

p(D\r,I) = Y[pi(ti)Y[po(t k ) 

i=l k=l 

Using po(t) = e~ r ^ At and pi(t) = r(t)Ate~ r ^ At the likelihood function is 

N+Q 

- r(t k )At 
k=i 

In the case of a periodic model, the non-uniformity in phase is characterised by 
the varying contents of the phase bins. Although the number of phase bins needed to 
detect any light curve and the origin of phase are unknowns, these will be marginalised 
or integrated out. If there are m phase bins the average rate A = ^ Sj=i r j an d the 
fraction of the total rate per period in phase bin j is fj = ^j. The likelihood function 
is shown to reduce to 

where u> is the postulated angular frequency, the starting phase, f the set of m values 
of fj and rij being the number of events occurring in bin j. 
The joint prior density for the parameters u, 0, A, f is 

p(u,(f),A,bff | M m )=p{u | M m )p{<f) | M m )p{A | M m )p(f | M m ) 

The prior densities are: 

(i) p(4> | M m ) = 1/27T, this assumes that any starting phase is equally likely, 

(ii) p(A | M m ) = 1/A max , this assumes that A does not change during the observation 
and any value of A from A = to A = A max is possible, 

(iii) p{ui | M m ) = uj\a.{uj hi /uji ), where [o;^,^] is a prior range for u, 

(iv) p(f) = (m-l)!* (l-Er=i/i)- 

The assignment of the priors of the models themselves is all that is needed before 
comparing the likelihoods of the models. The two models are equally likely a priori and 
so the prior likelihood of the non-periodic model (Mi), p(M\) = 1/2 and that for the 
periodic model (M m , m = 2, m max ), p(M m \ I) = l/2u where v = m max - I. 



N 



p(D\r,I) = At 



N 



n 



exp 



i=i 
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The final result for the odds O against a uniform model of phase and in favour of 
a periodic model with phase and period unknown (a common case) is: 



O 



1 



NUm-V 



ww duj 

UJ 



2tt 



rn 



N 



(20) 



2-Kuhiiujhi/uJio) (N + m - 1)! J Ulo luJ W m (uj,(f)) 

where W m (u, <ft) is the number of ways that the set of rij observed counts can be made 
by distributing iV counts in m bins: 



n 



m i 

3=i n r 



and rij, the number of events placed in the j th phase bin depends on u, <fi and m. 



If the period is known, this reduces to: 



OM 



1 jV!(m-l)! 
27ru (N + m- 1)! 



2tt 



d(f> 



m 



N 



(21) 



In order to illustrate the difference between the information available from this 
treatment and from the Rayleigh test, a data set has been generated containing time- 
tagged random events with a constant mean rate, plus a periodic component. The results 



are shown in figure |5.4.2| . A particular point to note is that although the Rayleigh power 
is always positive, even for pure noise, in the case of LOG(Bayesian odds), peaks do not 
become 'interesting' until they become positive. This is because the 'degrees of freedom' 
have been accounted for automatically and cause the offset seen in figure [5.4.2| so that 
peaks falling below log (Odds) = are just those expected from noise. It can be seen 
much more clearly in the Bayesian Odds diagram that there is only one significant peak, 
at the period simulated. 




Trial Period <Y> 



Figure 5. A comparison of Rayleigh Power and Bayesian Odds using random 
simulated data with a periodic signal at P = 8.2s 



The method has been used to detect a weak pulsar signal from SNR 0540-693 
in ROSAT data, which could not be detected using a standard FFT technique pD|. 
Moreover, the precision of determining the frequency was much higher for the Bayesian 
method than for x 2 using epoch-folding. The frequency precision of the latter is 
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determined mainly by the duration of the data and is not strongly influenced by the 
number of photons. Gregory and Loredo show||79|| that the Bayesian method obtains 
greater precision in parameter estimation with more photons. The method has also been 
used to detect 1600 day modulation in the long-term radio emission of an X-ray binary, 
with very non-uniformly sampled data and a Gaussian noise of unknown magnitude 
|8C|, |SH|. There has been a recent independent use of a Bayesian method to calculate the 
upper limit to a pulsed flux at a known period, independent of pulse width and pulse 
phase p2 . 



6. Conclusions 



The hope of this review is that the more commonly met data analysis problems may be 
approached by the cosmic ray worker with a more consistent and up to date approach. 
There have been a number of advances in recent years in the tools, and more importantly 
in the methods, available to cosmic ray experimenters to ensure that the maximum use is 
made of hard-won data. The traditional statistical methods have resulted in a measure 
of agreement on the 'correct' way to look for sources from ON/OFF data, change points 
(bursts) in 1- and 2-dimensions and in periodicity. The application of these methods 
requires care to ensure that the 'degrees of freedom' are kept under control and properly 
accounted for: many of the criticisms of claimed sources have been based on the latter. 

New Bayesian methods of testing hypotheses have recently been proposed. A 
central theme of these methods is that classical methods often cloak ignorance in a 
way which distorts the results. There are claimed to be significant benefits to the use 
of Bayesian methods which derive from the requirement to be absolutely specific about 
the hypotheses and the methodology of marginalising nuisance parameters. In contrast 
to classical statistical methods, where various statistics may be generated from the 
same data, each with different assumptions, degrees of freedom and power, Bayesian 
methods provide a framework for describing completely the data and allow the direct 
comparison of specified hypotheses. A practical result of the philosophical differences 
between the approaches is that, rather than relying on a relatively easy-to-use, pre- 
packaged test statistic, with the accompanying dangers of hidden degrees of freedom, a 
Bayesian method requires the data interpreter to model the hypotheses precisely. The 
obvious disadvantages of this are claimed to be more than compensated by the directness 
of the link between the hypotheses and the data. Bayesian methods may require some 
time to become accepted in the field, in that the methodologies and ideas have not 
traditionally been part of the training of physicists; indeed may not have been as useful 
if physicists' training in classical methods had been better. 
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